The Second Competition on Spatial Statistics for Large Datasets

Abstract

In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has rapidly increased with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive on traditional hardware architectures for large datasets, as it requires high computing power and a large memory footprint when dealing with dense matrix operations. Over the years, various approximation methods have been proposed to address such issues; however, the community lacks a holistic process to assess their efficiency. To provide a fair assessment, in 2021 we organized the first competition on spatial statistics for large datasets, generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of participants, we organized the second competition in 2022, focusing on predictions for more complex processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe in detail the data generation procedure and make the valuable datasets publicly available for wider adoption. Then, we review the submitted methods from fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.

Similar resources

Likelihoods for Large Spatial Datasets

Datasets in the fields of climate and environment are often very large and irregularly spaced. To model such datasets, the widely used Gaussian process models in spatial statistics face tremendous challenges due to the prohibitive computational burden. Various approximation methods have been introduced to reduce the computational cost. However, most of them rely on unrealistic assumptions of th...

Bayesian Modeling for Large Spatial Datasets.

We focus upon flexible Bayesian hierarchical models for scientific data available at geo-coded locations. Investigators are increasingly turning to spatial process models to analyze such datasets. These models are customarily estimated using Markov Chain Monte Carlo (MCMC) methods, which have become especially popular for spatial modeling, given their flexibility and power to fit models that wo...

Sparse Density Representations for Simultaneous Inference on Large Spatial Datasets

Large spatial datasets often represent a number of spatial point processes generated by distinct entities or classes of events. When crossed with covariates, such as discrete time buckets, this can quickly result in a data set with millions of individual density estimates. Applications that require simultaneous access to a substantial subset of these estimates become resource constrained when d...

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the...

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of...

Journal

Journal title: Journal of Data Science

Year: 2022

ISSN: 1680-743X, 1683-8602

DOI: https://doi.org/10.6339/22-jds1076